AITopics | Monagas State

Collaborating Authors

Monagas State

World Modeling with Probabilistic Structure Integration

Kotar, Klemen, Lee, Wanhee, Venkatesh, Rahul, Chen, Honglin, Bear, Daniel, Watrous, Jared, Kim, Simon, Aw, Khai Loong, Chen, Lilian Naing, Stojanov, Stefan, Feigelis, Kevin, Thobani, Imran, Durango, Alex, Jedoui, Khaled, Kazemian, Atlas, Yamins, Dan

arXiv.org Artificial Intelligence, Sep-15-2025

We present Probabilistic Structure Integration (PSI), a system for learning richly controllable and flexibly promptable world models from data. PSI consists of a three-step cycle. The first step, Probabilistic prediction, involves building a probabilistic graphical model Psi of the data, in the form of a random-access autoregressive sequence model. Psi supports a complete set of learned conditional distributions describing the dependence of any variables in the data on any other set of variables. In step 2, Structure extraction, we show how to extract underlying low-dimensional properties in the data, corresponding to a diverse set of meaningful "intermediate structures", in a zero-shot fashion via causal inference on Psi. Step 3, Integration, completes the cycle by converting these structures into new token types that are then continually mixed back into the training diet as conditioning signals and prediction targets. Each such cycle augments the capabilities of Psi, both allowing it to model the underlying data better and creating new control handles, akin to a universal LLM-like prompting language. We train an instance of Psi on 1.4 trillion tokens of internet video data; we use it to perform a variety of useful video prediction and understanding inferences; we extract state-of-the-art optical flow, self-supervised depth and object segmentation; and we use these structures to support a full cycle of predictive improvements.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.09737

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Venezuela > Monagas State > Maturin (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Workflow (0.89)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(4 more...)

Leveraging Approximate Caching for Faster Retrieval-Augmented Generation

Bergman, Shai, Ji, Zhang, Kermarrec, Anne-Marie, Petrescu, Diana, Pires, Rafael, Randl, Mathis, de Vos, Martijn

arXiv.org Artificial Intelligence, Mar-7-2025

Retrieval-augmented generation (RAG) enhances the reliability of large language model (LLM) answers by integrating external knowledge. However, RAG increases end-to-end inference time, because searching large vector databases for relevant documents is computationally expensive. To address this, we introduce Proximity, an approximate key-value cache that optimizes the RAG workflow by exploiting similarities between user queries. Instead of treating each query independently, Proximity reuses previously retrieved documents when a similar query appears, reducing reliance on expensive vector database lookups. We evaluate Proximity on the MMLU and MedRAG benchmarks: it reduces retrieval latency by up to 59% while maintaining response accuracy, and it lowers the computational burden on the vector database. We also experiment with different similarity thresholds and quantify the trade-off between speed and recall. Our work shows that approximate caching is a viable and effective strategy for optimizing RAG-based systems.
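The caching idea described in the Proximity abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes queries arrive already embedded as vectors, compares them by cosine similarity with a linear scan, evicts entries first-in-first-out, and uses a hypothetical `retrieve` callable to stand in for the vector-database lookup. The class name `ApproximateRetrievalCache` is likewise illustrative.

```python
import math
from collections import deque
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ApproximateRetrievalCache:
    """Hypothetical similarity-keyed cache placed in front of an expensive retriever."""

    def __init__(self, retrieve: Callable[[Vector], List[str]],
                 threshold: float = 0.9, capacity: int = 128):
        self.retrieve = retrieve          # assumed expensive vector-database lookup
        self.threshold = threshold        # minimum similarity that counts as a hit
        self.entries: deque = deque(maxlen=capacity)  # oldest entry dropped when full
        self.hits = 0
        self.misses = 0

    def lookup(self, query_vec: Vector) -> List[str]:
        # Find the cached query most similar to the incoming one.
        best_sim, best_docs = -1.0, None
        for key, docs in self.entries:
            sim = cosine(query_vec, key)
            if sim > best_sim:
                best_sim, best_docs = sim, docs
        if best_docs is not None and best_sim >= self.threshold:
            self.hits += 1
            return best_docs              # reuse documents from a similar past query
        self.misses += 1
        docs = self.retrieve(query_vec)   # fall back to the full lookup
        self.entries.append((list(query_vec), docs))
        return docs


# Usage with a stand-in retriever that just numbers its calls.
calls = []

def fake_retrieve(vec: Vector) -> List[str]:
    calls.append(vec)
    return [f"doc-{len(calls)}"]

cache = ApproximateRetrievalCache(fake_retrieve, threshold=0.95)
print(cache.lookup([1.0, 0.0]))    # miss -> ['doc-1']
print(cache.lookup([0.99, 0.05]))  # cosine ~0.999 with the cached key -> hit, ['doc-1']
print(cache.lookup([0.0, 1.0]))    # orthogonal -> miss, ['doc-2']
```

Raising `threshold` makes hits rarer but keeps reused documents closer to what a fresh lookup would return; lowering it skips more lookups at some cost in recall, which is the speed/recall trade-off the abstract refers to.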

cache, latency, query, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3721146.3721941

2503.05530

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Netherlands > South Holland > Rotterdam (0.05)
Europe > Switzerland > Vaud > Lausanne (0.05)
(4 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Adapting Large Language Models via Reading Comprehension

Cheng, Daixuan, Huang, Shaohan, Wei, Furu

arXiv.org Artificial Intelligence, Sep-18-2023

We explore how continued pre-training on domain-specific corpora influences large language models, revealing that training on the raw corpora endows the model with domain knowledge but drastically hurts its prompting ability for question answering. Taking inspiration from human learning via reading comprehension (practice after reading improves the ability to answer questions based on the learned knowledge), we propose a simple method for transforming raw corpora into reading comprehension texts. Each raw text is enriched with a series of tasks related to its content. Our method, highly scalable and applicable to any pre-training corpora, consistently enhances performance across various tasks in three different domains: biomedicine, finance, and law. Notably, our 7B language model achieves competitive performance with domain-specific models of much larger scales, such as BloombergGPT-50B. Furthermore, we demonstrate that domain-specific reading comprehension texts can improve the model's performance even on general benchmarks, showing the potential to develop a general model across even more domains. Our model, code, and data will be available at https://github.com/microsoft/LMOps.

knowledge, language model, reading comprehension text, (15 more...)

arXiv.org Artificial Intelligence

2309.09530

Country:

North America > United States > New York > New York County > New York City (0.04)
South America > Venezuela > Monagas State > Maturin (0.04)
North America > United States > Illinois (0.04)

Genre: Research Report (0.82)

Industry:

Law (1.00)
Education > Assessment & Standards > Student Performance (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

Parida, Shantipriya, Abdulmumin, Idris, Muhammad, Shamsuddeen Hassan, Bose, Aneesh, Kohli, Guneet Singh, Ahmad, Ibrahim Said, Kotwal, Ketan, Sarkar, Sayan Deb, Bojar, Ondřej, Kakudi, Habeebah Adamu

arXiv.org Artificial Intelligence, May-28-2023

This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold-standard English-Hausa parallel sentences (6,022 questions and 6,022 answers), translated so as to ensure that they match the semantics of the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, and text-only and multimodal machine translation.

artificial intelligence, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2305.17690

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Africa > Nigeria > Jigawa State > Dutse (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(32 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.93)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

The Venezuelans Trying to Escape Their Country Through Video Game Grunt Work

Slate, Aug-25-2021, 13:00:00 GMT

On a recent afternoon in Maracaibo, Venezuela, Alexander Marinez, who has short-cropped black hair and three-to-four-day stubble, sat in front of his computer tracking herbiboars in the mushroom forests on Fossil Island. He pressed down on his glowing mouse, the newest addition to his otherwise timeworn gaming setup. The pixelated character on his computer screen followed the tracks of a hedgehoglike creature with triangular tusks and herbs growing out of its back. Outside Marinez's one-story house, the sun bore down on the dirt road. His home lies about six miles away from the strait that connects the Caribbean Sea with Lake Maracaibo, one of the world's richest sources of oil. The character inspected a tunnel. Suddenly, the herbiboar appeared, and the character attacked, stunning it.

marinez, runescape, venezuela, (13 more...)

Slate

Country:

South America > Venezuela > Zulia State > Maracaibo (0.46)
Atlantic Ocean > Caribbean Sea (0.25)
South America > Venezuela > Lake Maracaibo (0.24)
(13 more...)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Government (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Communications (0.95)
Information Technology > Artificial Intelligence > Games (0.51)

Pull out all the stops: Textual analysis via punctuation sequences

Darmon, Alexandra N. M., Bazzi, Marya, Howison, Sam D., Porter, Mason A.

arXiv.org Artificial Intelligence, Jan-16-2020

Whether enjoying the lucid prose of a favorite author or slogging through some other writer's cumbersome, heavy-set prattle (full of parentheses, em dashes, compound adjectives, and Oxford commas), readers will notice stylistic signatures not only in word choice and grammar, but also in punctuation itself. Indeed, the punctuation sequences of different authors look marvelously (and visually strikingly) different. Punctuation is a largely overlooked stylistic feature in "stylometry", the quantitative analysis of written text. In this paper, we examine punctuation sequences in a corpus of literary documents and ask the following questions: Are the properties of such sequences a distinctive feature of different authors? Is it possible to distinguish literary genres based on their punctuation sequences? Do the punctuation styles of authors evolve over time? Are we on to something interesting in trying to do stylometry without words, or are we full of sound and fury (signifying nothing)?
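As a rough illustration of "stylometry without words", the sketch below reduces a text to its punctuation sequence and computes two simple summaries: the relative frequency of each mark, and first-order (Markov-style) transition probabilities between consecutive marks. The tracked set of marks and the function names are assumptions made here for illustration; the paper's own feature set and list of marks may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed set of tracked marks; the paper's exact set may differ.
PUNCTUATION = set('!"(),.:;?-\'')


def punctuation_sequence(text: str) -> List[str]:
    """Keep only the tracked punctuation marks, in reading order."""
    return [ch for ch in text if ch in PUNCTUATION]


def mark_frequencies(seq: List[str]) -> Dict[str, float]:
    """Relative frequency of each mark in the sequence."""
    counts = Counter(seq)
    total = len(seq)
    return {mark: n / total for mark, n in counts.items()} if total else {}


def transition_probabilities(seq: List[str]) -> Dict[Tuple[str, str], float]:
    """Estimate P(next mark | current mark) from adjacent pairs in the sequence."""
    pair_counts = Counter(zip(seq, seq[1:]))
    from_counts = Counter(seq[:-1])  # how often each mark is followed by another
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}


text = 'Well, then; is it "over"? No. Not yet, not quite.'
seq = punctuation_sequence(text)
print("".join(seq))                              # ,;""?.,.
print(mark_frequencies(seq)[","])                # 0.25
print(transition_probabilities(seq)[('"', "?")])  # 0.5
```

Comparing such frequency vectors or transition tables across documents (for example, with a distance between distributions) is one way to ask whether punctuation alone separates authors or genres.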

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1017/S0956792520000157

1901.00519

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.27)
North America > United States > New York > New York County > New York City (0.14)
(16 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Government > Regional Government (0.45)
Media > Music (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Information Management (0.68)
(2 more...)

1000 novels everyone must read: Science Fiction & Fantasy (part two)

AITopics Original Links, Jan-18-2017, 11:30:30 GMT

When Haldeman returned from Vietnam, with a Purple Heart for the wounds he had suffered, he wrote a story about a pointless conflict that seems as if it will never end. It was set in space, and the enemies were aliens, but 18 publishers decided it was too close to home before St Martin's Press took a gamble. The book that "nobody wants to read" went on to win many prizes. It's not perfect (it's hard to take seriously a future in which heterosexuality is a perversion), but the anti-war message is as powerful as ever.

Known for his intricate short stories and critically acclaimed mountaineering novel Climbers, Harrison cut his teeth on SF. In typical fashion, he writes space opera better than many who write only in the genre. For all its star travel and alien artefacts, scuzzy 25th-century spaceports and drop-out space pilots, Light is actually about twisting three plotlines as near as possible to snapping point. This is as close as SF gets to literary fiction, and literary fiction gets to SF.
Jon Courtenay Grimwood

Amateur stonemason, waterbed designer, reformed socialist, nudist, militarist and McCarthyite, Heinlein is one of the most interesting and irritating figures in American science fiction.

artificial intelligence, guardian bookshop, science fiction, (14 more...)

AITopics Original Links

Country:

Asia > Vietnam (0.24)
Europe > Russia (0.04)
Europe > Ireland (0.04)
(10 more...)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Government (1.00)

Technology: Information Technology > Artificial Intelligence > Science Fiction (0.61)
